Abstract: Dam reservoir level prediction is important for dam construction, operation, design and safety.
In this study, dam reservoir level change predictions were investigated using the M5 Decision Tree (M5
Tree) and Adaptive Neuro-Fuzzy Inference System (ANFIS) models. To model the daily dam reservoir
water level (t), the lagged reservoir water level (t-1), stream flow (t) and precipitation height in the
dam basin (t) were used as inputs. The model results were compared with those of a conventional multiple
linear regression (MLR) model. The models were analyzed with graphical and statistical results. The
coefficient of determination (R²), root mean square error (RMSE) and mean absolute error (MAE)
performance criteria were taken into account when comparing the prediction models. The results showed
that the M5 Tree and ANFIS models performed better than MLR in predicting the dam reservoir level change.
I. INTRODUCTION 


Reservoirs and dams are essential to the management of
water resources. In addition to providing water to cities,
they are used for hydroelectric power production, flood
control, and agricultural irrigation. The level of a
multipurpose reservoir must be monitored regularly so that
the necessary adjustments can be made on time and the
facility can operate at maximum performance. Forecasting
water levels is one of the most difficult tasks for planners
and operators in water supply management.


The water volume in a dam reservoir is controlled by
accumulating and releasing water at the right time. When
precautions are not taken in time, water-related problems
can cause loss of life and property. Proper dam reservoir
management is therefore a necessity not only for freshwater
supply but also for preventing possible damage. One of the
basic conditions for managing dam reservoirs effectively is
to determine the reservoir water volume and to be able to
predict the rises and falls in this volume.
The first studies to determine dam reservoir capacity
were made by Ripple [1] and Sudler [2]. Since those studies,
many researchers have used classical and traditional
methods in dam reservoir studies. Sudheer and Jain [3]
sought to explain the internal behavior of artificial neural
networks with river flow models. Sudheer [4] sought to
create river models with information extracted from trained
neural networks. Üneş [5] and Üneş et al. [6] sought to
determine dam reservoir level changes with artificial
intelligence techniques. In these methods, the reservoir
volume is defined through the conservation of mass
(continuity equation) at the macro scale in hydraulic
research systems. Past studies on the water level and
volume in lakes generally examined the stability of the
annual water level using mass-volume methods and
statistical methods.


An earlier study used artificial neural networks (ANN) in
conjunction with tree-based models, including decision
trees (M5T), random forests (RF), and gradient-boosted
trees (GB), to predict the inflow to the Soyang River Dam
in South Korea [7]. That research showed that an ensemble
method, which merges the RF/GB forecasts with a
multilayer perceptron (MLP), might outperform any single
individual model. The Upo wetland in South Korea serves
as another example of the predictive power of tree-based
approaches: compared with ANNs, decision trees (DTs), and
support vector machines (SVM), RF was found to have the
best forecast accuracy [8].


To estimate the water level of Lake Erie, other techniques
such as the Gaussian process (GP), multiple linear
regression (MLR), and k-nearest neighbor (KNN) have also
been compared to tree-based and ANN models [9]. The
findings show that machine learning techniques,
particularly the MLR and M5P model tree, outperformed
the process-based advanced hydrologic prediction system
(AHPS) in terms of accuracy and training speed [10].


In this study, forecasting models were developed for the
dam lake water level. Stream flow, the amount of
precipitation falling in the basin and the lagged lake water
level were used as independent variables. Two models were
used: the M5 Decision Tree (M5 Tree), a machine learning
technique that performs well on nonlinear problems, and the
Adaptive Neuro-Fuzzy Inference System (ANFIS), a hybrid
method that combines neural networks with fuzzy logic.


II. MATERIAL AND METHODS 
Study Area 


The study location is Lake Tuscaloosa, located close to
Tuscaloosa, Alabama, USA (Figure 1). The reservoir was
formed in west-central Alabama by damming the North
River. Thornton Jones built it to supply water to Tuscaloosa
citizens as well as for industrial purposes, and it was
finished in 1970 at a cost of around $7,725,000. The city
built Lake Tuscaloosa when its population grew and its two
existing reservoirs, Harris Lake and Lake Nicol, could no
longer hold enough water. Because it is close to Northport
and Tuscaloosa, the lake is also a popular spot for outdoor
recreation.


Fig.1: Study area 


The data used in this study were obtained from the United
States Geological Survey (USGS). Streamflow (Q, m³/s),
precipitation height in the basin (P, cm) and Lake Water
Level (LWL, m) variables were used in the estimation
models. The daily change in the LWL variable of the
Tuscaloosa reservoir between 2018 and 2021 is given in
Figure 2.
Fig.2: Daily lake water level change (horizontal axis: days) 
Methods 


Multiple Linear Regression (MLR) 


Multiple linear regression is one of the methods used to
model the cause-effect relationship between two or more
variables. If a single independent variable is used as input
to the model that estimates the dependent variable, the
method is called simple regression; if more than one
independent variable is used, it is called multiple
regression. In the MLR method, the effect of each
independent variable on the dependent variable is expressed
by its regression coefficient, which shows the degree of
that effect in the regression equation. Multiple linear
regression is given in Equation 1.


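$$Y_i = \left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n\right) + \varepsilon_i \qquad (1)$$


This equation contains only linear terms. Here,
$X_1, \dots, X_n$ are the independent variables, $Y_i$ is the
dependent variable, $\beta_0, \dots, \beta_n$ are the
regression coefficients and $\varepsilon_i$ is the error.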
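
As an illustration of Equation 1, the regression coefficients can be estimated by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the function names `fit_mlr` and `predict_mlr` and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimate of [b0, b1, ..., bn] for y = b0 + X @ b + error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Evaluate the fitted linear equation for new inputs."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

# Synthetic, noise-free illustration (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([68.0, 0.02, 0.01, 0.9])
y = true_coef[0] + X @ true_coef[1:]
coef = fit_mlr(X, y)
```

Because the synthetic target is generated without noise, the fitted coefficients recover the values used to generate it up to floating-point error.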


M5 Decision Tree (M5 Tree) 


M5 Tree was first proposed by Quinlan [11]. The method
estimates the dependent variable in a fast, practical and
understandable way. It is a versatile model that handles
numerical data and missing values, is quick to train, and
produces understandable outputs with high accuracy.
Witten et al. [12] attribute this to the robust and versatile
operation of decision tree learning, which can cope with the
demands of real-world data sets. The M5 Tree algorithm
builds a regression tree by repeatedly dividing the sample
space using tests on a single feature, choosing at each step
the test that maximizes the reduction in the standard
deviation of the target values. The standard deviation
reduction (SDR) is given in Equation 2.
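$$SDR = sd(T) - \sum_{i} \frac{|T_i|}{|T|}\, sd(T_i) \qquad (2)$$

Here $T$ is the set of examples that reach the node, $T_i$ is
the subset of examples with the $i$-th outcome of the test,
and $sd$ denotes the standard deviation.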
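
The split criterion in Equation 2 can be sketched as follows for a binary test on one numeric feature. This is a minimal illustration, assuming NumPy, the population standard deviation, and candidate thresholds taken at midpoints between sorted unique feature values; the helper names `sdr` and `best_threshold` are hypothetical, and the sketch omits the pruning and leaf regression models of the full M5 algorithm.

```python
import numpy as np

def sdr(target, mask):
    """Standard deviation reduction for splitting `target` by a boolean `mask`."""
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    parts = [target[mask], target[~mask]]
    # Weighted sum of subset standard deviations; empty subsets contribute nothing.
    weighted = sum(len(p) / len(target) * np.std(p) for p in parts if len(p) > 0)
    return np.std(target) - weighted

def best_threshold(feature, target):
    """Return (threshold, sdr) for the test `feature <= threshold` with the largest SDR."""
    feature = np.asarray(feature, dtype=float)
    values = np.unique(feature)
    if len(values) < 2:
        raise ValueError("feature needs at least two distinct values to split")
    candidates = (values[:-1] + values[1:]) / 2
    scores = [sdr(target, feature <= c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Toy data: the target jumps between feature values 3 and 10.
threshold, score = best_threshold([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
```

In this toy case the chosen test separates the two groups of target values completely, so the weighted standard deviation of the subsets is zero and the SDR equals the standard deviation of the whole set.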


Adaptive Neuro Fuzzy Inference System (ANFIS) 


An adaptive network-based fuzzy inference system
(ANFIS) is an artificial neural network method based on a
fuzzy inference system. The ANFIS model was developed
by Jang in the early 1990s and is used in modeling
nonlinear functions and estimating chaotic time series
[13-14]. ANFIS consists of directly connected nodes, each
of which represents a processing unit [15]. Since ANFIS
uses both artificial neural networks and fuzzy logic
inference methods, it uses a hybrid learning algorithm [16].
There are two approaches to fuzzy inference systems: that
of Mamdani and Assilian and that of Takagi and Sugeno
[17]. Applying ANFIS generally requires data sets with
inputs and outputs. The ANFIS method finds the best
values for the membership functions of the fuzzy sets by
training the model on the principle of reducing errors. It
also creates the fuzzy rules for the FIS. The structure of
the ANFIS is shown in Figure 4. Here, "x, y, z, t" are the
independent variables, "a1, a2, b1, b2, c1, c2, d1, d2" are
the input parameters, "Π" (pi) marks the product nodes that
form the fuzzy rules, "N" marks the normalization nodes
and "wi" are the rule weights.


Fig.4: ANFIS model with four inputs and one output (Layer 1 to Layer 5). 


In Figure 4, the membership function is selected in the 1st
layer, and the membership levels of the linguistic variables
are determined. In the ANFIS model of this study, the
number of membership functions is two for each
independent variable. In the 2nd layer, all nodes are fixed
nodes indicated by the symbol "Π". The products of the
outputs of the first layer represent the resulting fuzzy
rules. In the 3rd layer, the nodes are again fixed nodes,
indicated by the symbol "N". Here ANFIS normalizes the
rule weights, and the normalized values are taken as the
layer output. In the 4th layer, the normalized weight values
(w) coming from the third layer are multiplied by a
first-degree polynomial, so that "w1*f1" is the layer
output for the first rule. In the 5th layer, there is only one
fixed node, indicated by "Σ", which sums all the incoming
values to give the overall output.
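
The five layers described above can be sketched as a single forward pass. This is a minimal illustration, assuming NumPy, Gaussian membership functions, and a first-order Sugeno rule base with one rule per combination of membership functions; none of these choices, nor the names `gaussian_mf` and `anfis_forward` and the parameter values, are confirmed by the study. The hybrid training step is not shown.

```python
import itertools
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree (an assumed membership function shape)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def anfis_forward(x, centers, sigmas, consequents):
    """
    x:           (n_inputs,) input vector
    centers:     (n_inputs, n_mfs) membership function centers
    sigmas:      (n_inputs, n_mfs) membership function widths
    consequents: (n_rules, n_inputs + 1) rows [p_1, ..., p_n, r] of each rule's
                 first-degree polynomial, ordered like itertools.product
    """
    x = np.asarray(x, dtype=float)
    n_inputs, n_mfs = centers.shape
    # Layer 1: membership degree of every input in every fuzzy set.
    mu = gaussian_mf(x[:, None], centers, sigmas)
    # Layer 2 (product nodes): firing strength of each rule.
    rules = list(itertools.product(range(n_mfs), repeat=n_inputs))
    w = np.array([np.prod([mu[i, k] for i, k in enumerate(rule)]) for rule in rules])
    # Layer 3 (N nodes): normalize the firing strengths.
    w_bar = w / w.sum()
    # Layer 4: first-degree polynomial of the inputs for each rule.
    f = consequents[:, :-1] @ x + consequents[:, -1]
    # Layer 5 (sum node): weighted sum gives the single output.
    return float(np.sum(w_bar * f))

# Three inputs with two membership functions each give 2**3 = 8 rules.
centers = np.array([[0.0, 1.0]] * 3)
sigmas = np.full((3, 2), 0.5)
consequents = np.zeros((8, 4))
consequents[:, -1] = 68.5  # every rule outputs the same constant
out = anfis_forward([0.2, 0.7, 0.5], centers, sigmas, consequents)
```

Since the normalized weights sum to one, a rule base in which every rule returns the same constant reproduces that constant, which is a convenient sanity check on the normalization step.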


III. RESULTS 


In the model analysis, the total data set contained 1091
daily records; the first 75% (818) was used as training data
and the last 25% (273) as test data. For the 273-day test
data, the performances of the MLR, M5 Tree and ANFIS
models were evaluated using the mean absolute error
(MAE), root mean square error (RMSE), and coefficient of
determination (R²) between model predictions and
measured values. The statistical criteria used are given in
the equations below. Table 1 shows the model performance
comparisons obtained from the analysis.
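
The input construction and the chronological 75%/25% split can be sketched as follows. This is a minimal illustration, assuming NumPy and three equally long daily series; the function names `make_dataset` and `chronological_split` are hypothetical, and whether the study dropped the first day when forming the lagged input is not stated.

```python
import numpy as np

def make_dataset(q, p, lwl):
    """Build inputs [Q(t), P(t), LWL(t-1)] and target LWL(t) from daily series."""
    q, p, lwl = (np.asarray(a, dtype=float) for a in (q, p, lwl))
    # The first day has no previous water level, so it is dropped here (assumption).
    X = np.column_stack([q[1:], p[1:], lwl[:-1]])
    y = lwl[1:]
    return X, y

def chronological_split(X, y, train_fraction=0.75):
    """Use the first part of the record for training and the rest for testing."""
    n_train = int(train_fraction * len(y))
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Toy series of 100 days (not the study's data).
days = np.arange(100, dtype=float)
X, y = make_dataset(days, 2 * days, 68.0 + 0.01 * days)
X_train, y_train, X_test, y_test = chronological_split(X, y)
```

For 1091 samples, `int(0.75 * 1091)` gives 818 training samples and leaves 273 for testing, which agrees with the test length reported above. The split is not shuffled, so the test period always follows the training period in time.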


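$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(LWL_{measurement,i} - LWL_{prediction,i}\right)^{2}} \qquad (3)$$


$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|LWL_{measurement,i} - LWL_{prediction,i}\right| \qquad (4)$$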
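
Equations 3 and 4 translate directly into code. The sketch below assumes NumPy; the function names are hypothetical. The text does not give a formula for the coefficient of determination, so the common definition 1 - SS_res/SS_tot is assumed here; the squared correlation coefficient is another definition in use and can give different values.

```python
import numpy as np

def rmse(measured, predicted):
    """Root mean square error (Equation 3)."""
    err = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(measured, predicted):
    """Mean absolute error (Equation 4)."""
    err = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(err)))

def r2(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy water levels in metres (not the study's data).
measured = [68.0, 68.5, 69.0]
predicted = [68.1, 68.5, 68.9]
scores = (mae(measured, predicted), rmse(measured, predicted), r2(measured, predicted))
```

MAE and RMSE carry the units of the water level (m), while the coefficient of determination is dimensionless.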


Table 1. Error statistics and coefficients of determination of the models.

Model     Model inputs              MAE (m)   RMSE (m)   R²
MLR       Q(t), P(t), LWL(t-1)      0.024     0.030      0.957
M5 Tree   Q(t), P(t), LWL(t-1)      0.010     0.018      0.965
ANFIS     Q(t), P(t), LWL(t-1)      0.009     0.016      0.971

MLR Results 


In the MLR model, the stream flow rate (Q(t), m³/s),
precipitation height in the basin (P(t), cm) and lagged lake
water level (LWL(t-1), m) were used as inputs to estimate
LWL. The test-phase results of the MLR method are given
as distribution and scatter graphs in Figures 5-6,
respectively.
Fig.6: Distribution graph of MLR model 


According to the scatter graph (Figure 6) and Table 1, the
coefficient of determination obtained was R² = 0.957. In
the test phase, the MLR model had the lowest coefficient of
determination of the three models. The MLR model
underestimated some of the peak LWL values relative to
the measured LWL, which lowers its coefficient of
determination.


M5 Tree Results 


In the M5 Tree model (as in the MLR method), LWL was
estimated using the stream flow rate (Q(t), m³/s),
precipitation height in the basin (P(t), cm) and lagged Lake
Water Level (LWL(t-1), m). The test-phase results of the
M5 Tree model are given as distribution and scatter graphs
in Figures 7-8, respectively.
Fig.7: Scatter graph of M5 Tree model 
Fig.8: Distribution graph of M5 Tree model 


According to the distribution and scatter graphs given in
Figures 7 and 8, there was good agreement between the
measured LWL and the M5 Tree estimates. Table 1 and
Figure 8 show a coefficient of determination of
R² = 0.965. The M5 Tree method performed better than the
MLR method in LWL estimation.

ANFIS Results 

In the ANFIS model (as in the MLR and M5 Tree models),
the LWL was estimated using the stream flow rate (Q(t),
m³/s), precipitation height in the basin (P(t), cm) and
lagged Lake Water Level (LWL(t-1), m). The test-phase
results of the ANFIS model are given as distribution and
scatter graphs in Figures 9-10, respectively.
Fig.10: Distribution graph of ANFIS model 


According to the distribution graph, the measured values
and the ANFIS estimates were in agreement. Figure 10 and
Table 1 show a coefficient of determination of
R² = 0.971. The ANFIS model generally gave estimates
closer to the LWL peak values, which raises its coefficient
of determination relative to the other methods. In terms of
the MAE, RMSE and R² shown in Table 1, the ANFIS
model (0.009; 0.016; 0.971) showed the best performance
of the three models for LWL estimation.
IV. CONCLUSION 


Predicting fluctuations in dam reservoir levels is critical
for designing and building lakeshore structures, for other
industrial operations, and for integrated water resources
management. The current study used MLR, M5 Tree, and
ANFIS models to predict the level of the Lake Tuscaloosa
dam reservoir in the United States. To evaluate the
performance of the multiple linear regression (MLR), M5
decision tree (M5 Tree) and adaptive network-based fuzzy
inference system (ANFIS) models, the coefficient of
determination (R²), mean absolute error (MAE), and root
mean square error (RMSE) were calculated. The following
conclusions can be drawn from the study.


According to the performance evaluation of the models, all
models successfully estimated the reservoir lake level. The
MLR and M5 Tree methods gave similar coefficients of
determination (0.957 and 0.965), although the M5 Tree
errors were lower.


The adaptive network-based fuzzy inference system
(ANFIS) model was more successful than the other two
models, with lower error values and a higher coefficient of
determination. Compared to the traditional models, the
proposed ANFIS model yields more accurate estimations of
the fluctuations in the reservoir level.